Abstract

Image texture is useful in image browsing, search and retrieval. A texture descriptor based on a multiresolution 
decomposition using Gabor wavelets is proposed. The descriptor consists of two parts: a perceptual browsing component 
(PBC) and a similarity retrieval component (SRC). The extraction methods of both PBC and SRC are based on 
a multiresolution decomposition using Gabor wavelets. PBC provides a quantitative characterization of the texture's 
structuredness and directionality for browsing application, and the SRC characterizes the distribution of texture energy 
in different subbands, and supports similarity retrieval. This representation is quite robust to illumination variations and 
compares favorably with other texture descriptors for similarity retrieval. Experimental results are provided. © 2000 
Elsevier Science B.V. All rights reserved.
1. Introduction

The recent advances in digital imaging and com- 
puting technology have resulted in a rapid accumu- 
lation of digital media in the personal computing 
and entertainment industry. In addition, large col- 
lections of such data already exist in many scientific 
application domains such as the geographic in- 
formation systems (GIS) and medical imaging. 
Managing large collections of multimedia data 
requires development of new tools and technolo- 
gies. This is evident in the current MPEG-7 stan- 
dardization effort whose objective is to provide a set 
of standardized tools to describe the multimedia 
content [9,15,16]. 

At the core of the MPEG-7 is a set of descriptors 
for audio-visual content. In [16] a descriptor is 
defined as a representation of a feature. A descrip- 
tor defines the syntax and semantics of the feature 
representation. Examples of low-level visual fea- 
tures include color, shape, motion, and texture. 
This paper describes a texture feature descriptor 
that is being proposed to the MPEG-7 standard 
[18]. Key functionalities supported by this descrip- 
tor include image browsing and similarity-based 
retrieval. 

Image texture has emerged as an important vis- 
ual primitive to search and browse through large 
collections of similar looking patterns. An image 
can be considered as a mosaic of textures and 
texture features associated with the regions can be 
used to index the image data. For instance, a user 
browsing an aerial image database may want to 
identify all parking lots in the image collection. 
A parking lot with cars parked at regular intervals 
is an excellent example of a textured pattern when 
viewed from a distance, such as in an airphoto. 
Similarly, agricultural areas and vegetation patches 
are other examples of textures commonly found in 
aerial and satellite imagery. Queries that could be 
supported in this context include "Retrieve all 
Landsat images of Santa Barbara which have less 
than 20% cloud cover" or "Find a vegetation patch 
that looks like this region". To support image 
retrieval or browsing, an effective representation 
of textures is required. 

One of the widely used representations of tex- 
tures is the texture feature proposed in [17] and its 
improved version in [6]. The texture feature used in 
[17,6] is based, to some extent, on models of human 
texture perception. More recently, several random- 
field-based texture models [10,14] and multiscale 
filtering methods [3,13] have been studied. Use of 
texture for content-based retrieval has been ex- 
plored by several researchers [6,11,12]. Among 
these, features computed from Gabor filtered 
images appear quite promising. A comprehensive 
evaluation of using Gabor features can be found in 
[11,13]. More recent evaluation and comparison 
using other texture features also support the obser- 
vation that the orientation and scale-selective Gabor 
filtered images capture relevant texture properties 
for applications such as image retrieval [8]. 

The proposed texture descriptor is based on 
Gabor filtering [11,13]. The descriptor has two 
parts: The first part relates to a perceptual charac- 
terization of texture in terms of structuredness, 
directionality and coarseness (scale). This repres- 
entation is useful for browsing type applications 
and coarse classification of textures. We call this 
part the perceptual browsing component (PBC). 
The second part provides a quantitative description 
that can be used for accurate search and retrieval. 
This is referred to as the similarity retrieval com- 
ponent (SRC). The SRC component is described in 
detail in an earlier paper [13]. Both of the compo- 
nents are derived from a multiresolution Gabor 
filtering. Key features of this descriptor are 

• It captures both the high-level perceptual 
characterization (in terms of directionality, 
structuredness, and coarseness of a texture), as 
well as a robust quantitative characterization at 
multiple scales and orientations. 

• Feature extraction is simple, involving image 
convolutions with a set of masks. The filters are 
based on a 2-D Gabor wavelet decomposition. 
Image convolutions can be efficiently imple- 
mented in hardware and software. 

• Multiple applications can be supported by the 
descriptor. For example, by using the PBC, browsing 
of an image database could be performed (e.g., 
show textures that are structured and are oriented 
at 90°). The SRC can be used for query-by-example 
applications wherein similarity retrieval is needed. 

The paper is organized as follows. The next section 
provides a brief introduction to Gabor filters. Com- 
puting the PBC is described in Section 3 and 
Section 4 details SRC computation. Experimental 
results are provided in Section 5. Section 6 con- 
cludes with discussions. 



2. Gabor filter bank [13] 

The use of Gabor filters in extracting texture 
descriptors is motivated by several factors. The 
Gabor representation has been shown to be opti- 
mal in the sense of minimizing the joint two-dimen- 
sional uncertainty in space and frequency [4]. 
These filters can be considered as orientation and 
scale tunable edge and line detectors, and the statis- 
tics of these micro features can be used to charac- 
terize the underlying texture. 

A two-dimensional Gabor function and its 
Fourier transform can be written as 

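$$g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right) + 2\pi j W x\right], \tag{1}$$

$$G(u, v) = \exp\left\{-\frac{1}{2}\left[\frac{(u - W)^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2}\right]\right\}, \tag{2}$$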



where $\sigma_u = 1/(2\pi\sigma_x)$ and $\sigma_v = 1/(2\pi\sigma_y)$. A class of 
self-similar functions, referred to as the Gabor 
wavelets, is now considered. Let g(x, y) be the 
mother wavelet. Then a self-similar filter dictionary 
can be obtained by appropriate dilations and 
rotations of g(x, y) through the generating function 
[13]: 

$$g_{mn}(x, y) = a^{-m} g(x', y'), \quad a > 1, \quad m, n \ \text{integer},$$

$$x' = a^{-m}(x\cos\theta + y\sin\theta) \quad \text{and} \quad y' = a^{-m}(-x\sin\theta + y\cos\theta), \tag{3}$$



where $\theta = n\pi/K$ and K is the total number of 
orientations. The scale factor $a^{-m}$ in (3) is meant to 
ensure that the energy is independent of m. This set 
of functions forms a non-orthogonal basis of 
functions for the multiresolution decomposition [13]. 

The non-orthogonality of the Gabor wavelets 
implies that there is redundant information in the 
filtered images, and the following strategy is used to 
reduce this redundancy. Let $U_l$ and $U_h$ denote the 
lower and upper center frequencies of interest. Let 
K be the number of orientations and S be the 
number of scales in the multiresolution decomposition. 
Then the design strategy is to ensure that the 
half-peak magnitude supports of the filter responses 
in the frequency spectrum touch each other 
as shown in Fig. 1. This results in the following 
formulas for computing the filter parameters $\sigma_u$ and 
$\sigma_v$ (and thus $\sigma_x$ and $\sigma_y$) [13]: 



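$$a = \left(\frac{U_h}{U_l}\right)^{\frac{1}{S-1}}, \qquad \sigma_u = \frac{(a - 1)\,U_h}{(a + 1)\sqrt{2\ln 2}},$$

$$\sigma_v = \tan\left(\frac{\pi}{2K}\right)\left[U_h - 2\ln 2\left(\frac{\sigma_u^2}{U_h}\right)\right]\left[2\ln 2 - \frac{(2\ln 2)^2\sigma_u^2}{U_h^2}\right]^{-1/2}, \tag{4}$$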



where $W = U_h$ and m = 0, 1, ..., S − 1. In order to 
eliminate sensitivity of the filter response to 
absolute intensity values, the real (even) components of 
the 2-D Gabor filters are biased by adding a 
constant to make them zero mean. (This can also be 
done by setting G(0, 0) in (2) to zero.) Filtering the 
image I(x, y) with $g_{mn}(x, y)$ results in 



$$W_{mn}(x, y) = \int I(x_1, y_1)\, g_{mn}^{*}(x - x_1, y - y_1)\, dx_1\, dy_1. \tag{5}$$




Fig. 1. The contours indicate the half-peak magnitude of the 
filter responses in the Gabor filter dictionary. The filter 
parameters used are $U_h$ = 0.04, $U_l$ = 0.05, K = 6 and S = 4 [6]. 
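The filter design of this section can be illustrated with a short NumPy sketch. It is only a sketch under stated assumptions, not the authors' implementation: the function names, the sampling grid and the kernel size are hypothetical, the parameter formulas follow Eq. (4), and the kernel follows Eqs. (1) and (3) with the real part biased to zero mean.

```python
import numpy as np


def gabor_bank_params(U_l, U_h, S, K):
    """Return (a, sigma_u, sigma_v) following Eq. (4)."""
    ln2 = np.log(2.0)
    a = (U_h / U_l) ** (1.0 / (S - 1))
    sigma_u = (a - 1.0) * U_h / ((a + 1.0) * np.sqrt(2.0 * ln2))
    sigma_v = (
        np.tan(np.pi / (2.0 * K))
        * (U_h - 2.0 * ln2 * sigma_u**2 / U_h)
        / np.sqrt(2.0 * ln2 - (2.0 * ln2) ** 2 * sigma_u**2 / U_h**2)
    )
    return a, sigma_u, sigma_v


def gabor_kernel(m, n, U_l, U_h, S, K, size=31):
    """Sample g_mn(x, y) of Eq. (3) on a size x size grid.

    `size` (odd) is a hypothetical choice; the text does not specify a mask size.
    """
    a, sigma_u, sigma_v = gabor_bank_params(U_l, U_h, S, K)
    sigma_x = 1.0 / (2.0 * np.pi * sigma_u)
    sigma_y = 1.0 / (2.0 * np.pi * sigma_v)
    W = U_h                      # modulation frequency, W = U_h
    theta = n * np.pi / K        # orientation of the n-th filter
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    scale = a ** (-m)
    # Rotated and dilated coordinates x', y' of Eq. (3).
    xr = scale * (x * np.cos(theta) + y * np.sin(theta))
    yr = scale * (-x * np.sin(theta) + y * np.cos(theta))
    envelope = np.exp(-0.5 * (xr**2 / sigma_x**2 + yr**2 / sigma_y**2))
    carrier = np.exp(2j * np.pi * W * xr)
    g = scale * envelope * carrier / (2.0 * np.pi * sigma_x * sigma_y)
    # Bias the real (even) component so that it has zero mean.
    return g - g.real.mean()
```

A full dictionary would be obtained by calling `gabor_kernel(m, n, ...)` for m = 0, ..., S − 1 and n = 0, ..., K − 1; filtering an image then amounts to convolving it with the complex conjugate of each kernel.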



3. Perceptual browsing component (PBC) 

From the multiresolution decomposition, a given 
image is decomposed into a set of filtered images. 
Each of these images represents the image informa- 
tion at a certain scale and at a certain orientation. 
The PBC captures the regularity (or the lack of it) 
in the texture pattern. Its computation is based on 
the following observations: 

• Structured textures usually consist of dominant 
periodic patterns. 

• A periodic or repetitive pattern, if it exists, could 
be captured by the filtered images. This behavior 
is usually captured in more than one filtered 
output. 

• The dominant scale and orientation information 
can also be captured by analyzing projections of 
the filtered images. 

Based on the above observations, we propose the 
following format for the PBC: 



$$\mathrm{PBC} = [v_1 \; v_2 \; v_3 \; v_4 \; v_5]. \tag{6}$$

(In Eq. (5), * indicates the complex conjugate.) 



• Regularity ($v_1$): $v_1$ represents the degree of 
regularity or structuredness of the texture. A larger 
value of $v_1$ indicates a more regular pattern. 
Consider the two patterns in Fig. 2. The pattern in 
Fig. 2(a) is intuitively more "regular" than that in 
Fig. 2(b), and hence should have a larger $v_1$ than 
Fig. 2(b). 

Fig. 2. Two examples of regularity of textures: (a) regular pattern; 
(b) irregular pattern. 

• Directionality ($v_2$, $v_3$): These represent the two 
dominant orientations of the texture. The accu- 
racy of computing these two components often 
depends on the level of regularity of the texture 
pattern. In our implementation, the orientation 
space is divided into 30° intervals. 

• Scale ($v_4$, $v_5$): These represent two dominant 
scales of the texture. Similar to directionality, the 
more structured the texture, the more robust the 
computation of these two components. 

The PBC computation is a two step procedure. 
The first step is the analysis of each filtered 
output. The objective of this step is to determine 
the existence of a repetitive pattern. The 
second step is performed on all filtered outputs 
that are identified as having some kind of 
regularity. 

3.1. Analysis of each filtered image and 
candidate selection 

To identify whether a filtered image is repetitive, 
the projections of each filtered image are computed 
and analyzed. Regular projections are identified 
and then grouped to find the dominant regularity. 
The analysis is described step by step below. 

Projection: For each filtered image, the projec- 
tions along horizontal and vertical directions are 
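computed. For an N × N image, the horizontal 
projection $P_H$ and vertical projection $P_V$ are 
defined as 

$$P_H(l) = \frac{1}{N}\sum_{k=1}^{N} W_{mn}(l, k) \quad \text{and} \quad P_V(k) = \frac{1}{N}\sum_{l=1}^{N} W_{mn}(l, k), \tag{7}$$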



where l, k = 1, ..., N, and $W_{mn}(l, k)$ represents the 
(m, n)th filtered output. For simplicity in notation, 
we drop the index (m, n) and the subscripts (H and 
V) in the following discussion. 

Autocorrelation: Consider now a projection P(l). 
The normalized autocorrelation function (NAC) is 
defined as 



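$$\mathrm{NAC}(k) = \frac{\sum_{m=k}^{N-1} P(m - k)\,P(m)}{\sqrt{\sum_{m=k}^{N-1} P^2(m - k)\,\sum_{m=k}^{N-1} P^2(m)}}. \tag{8}$$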



Fig. 3 shows the horizontal projections of texture 
pattern (a) in Fig. 2. 
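The projection and autocorrelation steps can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the magnitude of the (complex) filtered output is projected so that the projections are real (an assumption), and the NAC is normalized by the energies of the two overlapping segments (also an assumption about the exact normalization).

```python
import numpy as np


def projections(W):
    """Horizontal and vertical projections of an N x N filtered output.

    Assumption: the magnitude |W| is projected so that P_H and P_V are real.
    """
    mag = np.abs(np.asarray(W))
    N = mag.shape[0]
    P_H = mag.sum(axis=1) / N   # P_H(l): average over the column index k
    P_V = mag.sum(axis=0) / N   # P_V(k): average over the row index l
    return P_H, P_V


def nac(P):
    """Normalized autocorrelation of a 1-D projection for lags k = 0..N-1."""
    P = np.asarray(P, dtype=float)
    N = len(P)
    out = np.zeros(N)
    for k in range(N):
        shifted = P[:N - k]     # P(m - k) for m = k..N-1
        current = P[k:]         # P(m)     for m = k..N-1
        denom = np.sqrt(np.sum(shifted**2) * np.sum(current**2))
        out[k] = np.sum(shifted * current) / denom if denom > 0 else 0.0
    return out
```

For a perfectly periodic projection, the NAC returns to 1 at every multiple of the period, which is what the subsequent peak analysis exploits.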

Peak detection: The local peaks and valleys of 
NAC(k) are then identified, and their positions and 
magnitudes are recorded. Let M be the number of 
peaks and N be the number of valleys. Let p_posi(i) 
and p_magn(i) (i = 1, 2, ..., M) be the positions and 
magnitudes of the peaks, respectively, and let 
v_posi(j) and v_magn(j) (j = 1, 2, ..., N) be the 
positions and magnitudes of the valleys, respectively. 
The contrast of the projection is then defined to be 

$$\mathrm{contrast} = \frac{1}{M}\sum_{i=1}^{M} \mathit{p\_magn}(i) - \frac{1}{N}\sum_{j=1}^{N} \mathit{v\_magn}(j). \tag{9}$$
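A possible way to compute the contrast is sketched below. The helper names are hypothetical, and the peak detector (strict local maxima and minima at interior samples) is an assumption, since the text does not specify how peaks and valleys are located.

```python
import numpy as np


def local_extrema(seq):
    """Indices of strict local maxima and minima (interior samples only)."""
    seq = np.asarray(seq, dtype=float)
    mid = seq[1:-1]
    peaks = np.where((mid > seq[:-2]) & (mid > seq[2:]))[0] + 1
    valleys = np.where((mid < seq[:-2]) & (mid < seq[2:]))[0] + 1
    return peaks, valleys


def contrast(nac_values):
    """Mean peak magnitude minus mean valley magnitude of an NAC sequence."""
    nac_values = np.asarray(nac_values, dtype=float)
    peaks, valleys = local_extrema(nac_values)
    if len(peaks) == 0 or len(valleys) == 0:
        return 0.0  # assumption: no contrast when peaks or valleys are missing
    return float(nac_values[peaks].mean() - nac_values[valleys].mean())
```

The peak indices returned by `local_extrema` play the role of p_posi(i), and the NAC values at those indices play the role of p_magn(i).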

Peak analysis: Given the sequence p_posi(i) of all 
M peaks detected from a projection, the average of 
the distances between successive peaks, dis, and the 
standard deviation of these distances, std, are 
computed. Let 

$$\gamma = \frac{\mathit{std}}{\mathit{dis}}. \tag{10}$$



A lower variance in the distances between peaks 
implies a more "consistent" repetitive pattern. 
A threshold can then be set to distinguish between 
regular and irregular patterns. If $\gamma$ is smaller than 
a pre-selected threshold $T_p$, the corresponding 
projection is considered to represent a repetitive or 
regular pattern. Those projections that pass this 
threshold are then checked for consistency. 
A simple agglomerative clustering [5] in the 
two-dimensional std-dis space is then used to remove 
the outliers. 

Fig. 3. NAC of horizontal projections of all the 4 × 6 filtered images from image T001.01. The projections labeled with '*' are the detected 
potential candidates and those also labeled with '+' are the final candidates after clustering. 
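The threshold test on the peak spacing can be sketched as follows. The function names are hypothetical, std is taken here as the standard deviation of the distances between successive peaks (an assumption), and the threshold value is left to the caller because the text does not state it. The clustering step is not reproduced.

```python
import numpy as np


def peak_spacing_stats(peak_positions):
    """Return (dis, std, gamma) for a sequence of peak positions, or None."""
    pos = np.sort(np.asarray(peak_positions, dtype=float))
    if pos.size < 2:
        return None               # no spacing can be measured
    d = np.diff(pos)              # distances between successive peaks
    dis = d.mean()
    std = d.std()
    return dis, std, std / dis


def is_regular(peak_positions, threshold):
    """True if the ratio std/dis falls below the chosen threshold."""
    stats = peak_spacing_stats(peak_positions)
    return stats is not None and stats[2] < threshold
```

Evenly spaced peaks give a ratio of zero and always pass; widely varying spacings give a large ratio and are rejected.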

Fig. 3 shows the NAC of the 24 horizontal projections 
for the image T001.01 (shown in Fig. 2(a)). 
The projections marked with "*" are the ones that 
pass the threshold test. Fig. 4(a) shows the 
distribution of std-dis of these potential candidates. 
Fig. 4(b) shows the results after the clustering. 
Those projections that pass the consistency check 
are marked with a "+" in Fig. 3. A similar 
analysis is performed on the vertical projections 
as well. 

From those projections that passed the consistency 
check, we identify the ones with the maximum 
contrast. Let $(m^*(H), n^*(H))$ denote the scale and 
orientation indices, respectively, of the horizontal 
projection with the maximum contrast. Similarly, 
let $(m^*(V), n^*(V))$ denote the scale and orientation 
indices, respectively, of the vertical projection with 
maximum contrast. Then, we have 

$$\mathrm{PBC}[v_4] = m^*(H) \quad \text{and} \quad \mathrm{PBC}[v_2] = n^*(H),$$

$$\mathrm{PBC}[v_5] = m^*(V) \quad \text{and} \quad \mathrm{PBC}[v_3] = n^*(V).$$
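The selection of the direction and scale components can be illustrated with a small sketch. The names are hypothetical, and the inputs are assumed to be dictionaries mapping (scale, orientation) index pairs to the contrast of each projection that passed the consistency check. Any offset between these indices and the values reported in the figures is not specified in the text and is left out.

```python
def dominant_indices(contrast_by_filter):
    """Return the (m, n) pair with maximum contrast, or None if empty."""
    if not contrast_by_filter:
        return None
    return max(contrast_by_filter, key=contrast_by_filter.get)


def direction_scale_components(horizontal, vertical):
    """Assemble v2..v5 from horizontal and vertical candidate contrasts."""
    h = dominant_indices(horizontal)
    v = dominant_indices(vertical)
    if h is None or v is None:
        return None
    (m_h, n_h), (m_v, n_v) = h, v
    # v2, v3: dominant orientations; v4, v5: dominant scales.
    return {"v2": n_h, "v3": n_v, "v4": m_h, "v5": m_v}
```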






Fig. 4. Clustering of potential candidates: the left figure shows the distribution of potential candidates from the projections shown in 
Fig. 3 and the right one shows the final candidates after clustering. 



3.2. Computing PBC[$v_1$] 

The method of measuring the degree of the struc- 
turedness is based on the following observations on 
the distribution of candidate vectors. 

• For strongly structured textures, their periodicity 
could be captured by multiple projections - the 
candidates chosen from the above procedure. 
Typically, these candidates are neighbors in the 
scale-orientation space. 

• If the texture is not structured or only weakly 
structured, the distribution of the candidates, if 
they exist, is usually sparse and the neighboring 
relationship can rarely be detected. 

If such a consistency in the neighboring projections is 
detected from the projections in the candidate set, this 
would result in a larger credit, indicating a stronger 
structuredness. Based on these observations, the can- 
didate projections are further classified as follows: 

3.2.1. Candidate classification 

$C_1$: For a specific candidate, we can find at least 
one other candidate at its neighboring scale or 
orientation. The value associated with this class is 
$V_1$ = 1.0. 

$C_2$: For a specific candidate, we can find at least 
one other candidate distributed at the same scale 
or orientation, but no candidate is located at its 
neighboring scale or orientation. The value 
associated with this class is $V_2$ = 0.5. 

$C_3$: The candidate is the only one distributed at 
its scale and orientation. The value associated with 
this class is $V_3$ = 0.2. 

At this stage, each of the candidate projections 
has an associated value computed based on the 
above classification. Let 

$$M = \sum_{i=1}^{3} N_i V_i, \tag{11}$$

where $N_i$ is the number of candidate projections 
classified as $C_i$. M is calculated for the horizontal 
($M_H$) and vertical ($M_V$) projections. Let 

$$M_{img} = M_H + M_V. \tag{12}$$

$M_{img}$ is quantized into $N_v$ bins using an option 
decision tree classifier [2]. The larger the value of 
$M_{img}$, the more structured the corresponding 
texture. In our current implementation, $N_v$ = 4. 
Consequently, each image is associated with 
a number $B_{img}$, $B_{img} \in \{1, \ldots, N_v\}$, indicating which 
bin the image belongs to: 

$$\mathrm{PBC}[v_1] = B_{img}.$$
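The scoring of Eq. (11) can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the function name is hypothetical, a "neighboring" candidate is taken to mean an adjacent scale at the same orientation or an adjacent orientation (with wrap-around) at the same scale, and the quantization of the final score by the decision tree classifier is not reproduced.

```python
def structuredness_score(candidates, K, values=(1.0, 0.5, 0.2)):
    """Sum of class values over candidate (scale, orientation) pairs.

    candidates: iterable of (m, n) index pairs for one projection direction.
    K: number of orientations (orientation adjacency is assumed cyclic).
    values: class values (V_1, V_2, V_3).
    """
    cands = set(candidates)
    total = 0.0
    for (m, n) in cands:
        others = cands - {(m, n)}
        has_neighbor = any(
            (m2 == m and min((n2 - n) % K, (n - n2) % K) == 1)
            or (n2 == n and abs(m2 - m) == 1)
            for (m2, n2) in others
        )
        shares_line = any(m2 == m or n2 == n for (m2, n2) in others)
        if has_neighbor:
            total += values[0]      # class C_1
        elif shares_line:
            total += values[1]      # class C_2
        else:
            total += values[2]      # class C_3
    return total
```

The image-level score would then be `structuredness_score(horizontal, K) + structuredness_score(vertical, K)`, following Eq. (12).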
4. Extraction of similarity retrieval component 
(SRC) 

4.1. Computing the similarity retrieval component 
(SRC) 

The mean $\mu_{mn}$ and the standard deviation $\sigma_{mn}$ of 
the magnitude of the transform coefficients are used 
to form the SRC: 

$$\mu_{mn} = \iint |W_{mn}(x, y)|\, dx\, dy \quad \text{and} \quad \sigma_{mn} = \sqrt{\iint \left(|W_{mn}(x, y)| - \mu_{mn}\right)^2 dx\, dy}. \tag{13}$$



The similarity retrieval component (SRC) vector is 
now constructed using $\mu_{mn}$ and $\sigma_{mn}$. For S scales 
and K orientations, this results in a vector 

$$\mathrm{SRC} = [\mu_{11} \; \sigma_{11} \; \ldots \; \mu_{SK} \; \sigma_{SK}].$$

Note the double index on the vector elements. In 
the experiment, we use four scales (S = 4) and six 
orientations (K = 6), resulting in a feature vector 

$$\mathrm{SRC} = [\mu_{11} \; \sigma_{11} \; \ldots \; \mu_{46} \; \sigma_{46}]. \tag{14}$$
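A minimal sketch of the SRC computation is given below. The function name and the array layout are hypothetical. As a discretization choice (an assumption), the integrals are replaced by the sample mean and sample standard deviation over the pixels of each filtered output.

```python
import numpy as np


def src_vector(filtered):
    """Build the SRC vector from the filtered outputs.

    filtered: complex array of shape (S, K, N, N) holding W_mn for every
    scale m and orientation n (this layout is an assumption).
    Returns a vector of length 2*S*K ordered as [mu, sigma] per (m, n).
    """
    mag = np.abs(np.asarray(filtered))
    mu = mag.mean(axis=(2, 3))       # mean magnitude per (m, n)
    sigma = mag.std(axis=(2, 3))     # standard deviation per (m, n)
    return np.stack([mu, sigma], axis=-1).reshape(-1)
```

With S = 4 and K = 6 this yields a 48-element vector, matching the length of the feature vector described in the text.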

4.2. Distance measure for similarity retrieval 
component (SRC) 

To perform the similarity retrieval, a distance 
measure is defined on the proposed feature vector. 



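Consider two image patterns i and j. Then the 
distance between the two patterns is defined to be 

$$d(i, j) = \sum_{m}\sum_{n} d_{mn}(i, j), \tag{15}$$

where 

$$d_{mn}(i, j) = \left|\frac{\mu_{mn}^{(i)} - \mu_{mn}^{(j)}}{\alpha(\mu_{mn})}\right| + \left|\frac{\sigma_{mn}^{(i)} - \sigma_{mn}^{(j)}}{\alpha(\sigma_{mn})}\right|. \tag{16}$$

$\alpha(\mu_{mn})$ and $\alpha(\sigma_{mn})$ are the standard deviations of 
the respective features over the entire database, 
and are used to normalize the individual feature 
components. 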
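The distance computation can be sketched as follows. The function names are hypothetical, the feature vectors are assumed to be stored as rows of a NumPy array, and replacing a zero standard deviation by 1 is an added safeguard that is not part of the text.

```python
import numpy as np


def database_alpha(features):
    """Per-component standard deviation over all database feature vectors."""
    alpha = np.asarray(features, dtype=float).std(axis=0)
    alpha[alpha == 0] = 1.0      # safeguard (assumption): avoid division by zero
    return alpha


def src_distance(f_i, f_j, alpha):
    """Sum of absolute feature differences, each normalized by alpha."""
    f_i = np.asarray(f_i, dtype=float)
    f_j = np.asarray(f_j, dtype=float)
    return float(np.sum(np.abs((f_i - f_j) / alpha)))
```

A query by example would compute `src_distance` between the query vector and every database vector and return the patterns with the smallest distances.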



5. Experimental results 

5.1. Browsing using PBC 

The parameter values used in the experiments 
are $U_l$ = 0.04, $U_h$ = 0.5, S = 4, K = 6 (in Eqs. (3) 
and (4)) and $N_v$ = 4. Thus, the resulting Gabor 
filter set has six orientations (30° intervals) and four 
scales. 

The PBC vectors for some of the Brodatz texture 
images [1] are shown in Figs. 5 and 6. The size of 
the images in the original Brodatz album is 
512 × 512. For evaluation purposes, each 512 × 512 
image is divided into four 256 × 256 subimages. 
Each of the images shown in Fig. 5 is just one of the 
four subimages of each texture image. PBC[$v_1$] 
has values between 1 and 4 ($N_v$ = 4). It can be 
observed that for the structured images, the estimated 
directions and scales match the perceived images very 
well. However, the scale and direction estimates are not 
very reliable for textures with low values of PBC[$v_1$]. 

Fig. 5. PBC of some Brodatz textures. Panel labels include T001: [4 1 4 3 3], T006: [4 1 4 4 4], T014: [4 1 4 4 4], T020: [4 2 3 3 2] and T095: [4 1 4 3 3]. 

Fig. 6. Browsing example: patterns having similar PBC to the query pattern (on the left). The PBC values are shown below each texture. In row (a) every pattern has PBC [4 1 4 3 3]; in row (b) every pattern has PBC [4 1 4 4 4]. 

The PBC computations are subjectively evaluated 
as follows. The 30 texture images from Fig. 5 were 
shown to five different individuals. They were asked 
to quantify the texture structuredness, directionality 
and scale on the same scale as our PBC computa- 
tion. The median values of each of the components 
are used for comparing with the PBC values com- 
puted by our method. For the computer-generated 
PBC values, we use the median of the values from 
the four sub-images of each texture. 

For the structuredness component PBC[$v_1$], the 
computer and human generated values are within 
one value deviation for 28 of 30 images. If we 
consider values greater than or equal to 2 as repres- 
enting the structured texture, the computed PBC 
values result in 17 structured and 13 non-structured 
textures. This is in good agreement with the human 
observers who agree with 16/17 (structured) and 
12/13 (non-structured). 

The computed dominant directions are also in 
good agreement with the human observers for the 
textures rated as structured. In 12 out of 16, the 
results are in complete agreement. It is observed 



that if a texture has horizontal and vertical pat- 
terns, the algorithm would pick up the correspond- 
ing diagonals as the directions. For the dominant 
scales, the human subjects had difficulty rating 
the textures on a scale of 4 and provided only one 
dominant scale for each pattern. It would have 
been more convenient, perhaps, to use the three 
scales (fine, medium and coarse) for the subjective 
tests. For the structured textures, the subjective and 
computed values for the first dominant scale were 
in agreement within one value deviation. Our 
proposed method did quite well in identifying scales for 
textures that had patterns at two significantly 
different scales. See, for example, T053 and T055 in 
Fig. 5, which contain patterns at different scales. 

5.2. Similarity retrieval using SRC 

In [13] we provided a comprehensive comparison 
with other state-of-the-art texture descriptors. The 
Brodatz texture album [1] is used in those experi- 
ments. This includes two descriptors based on 
orthogonal wavelets, SRC and [3], and one based 
on multiresolution simultaneous autoregressive 
model (MR-SAR) [14]. The SRC compares quite 
favorably with those other texture descriptors. The 
main observations from [13] are: 

• In general, feature components corresponding to 
higher frequencies have better discriminat- 
ing performance. However, decomposing the 
high-frequency bands further in the tree-structured 
wavelet representation of [3] often leads to 
a decrease in performance, indicating that these 
features are not very robust. 

• Experiments with different orthogonal wavelet 
transforms indicate very little variation in perfor- 
mance with respect to the choice of filters. 

• The marginal improvement of the tree structured 
wavelet features comes at the expense of having 
a much larger feature vector, which adds to the 
overhead associated with indexing and searching. 

• It is important to explore different similarity 
measures for each of the different sets of features. 
For example, using the Mahalanobis distance 
instead of the Euclidean distance improved the 
performance from 64% to 73% for the MR-SAR 
features. Normalized Euclidean distance worked 
better for all the others. 







Fig. 7. Similarity retrieval using SRC on an airphoto database: (a) the region retrievals from areas containing some buildings; (b) an 
example of retrieving a part of the runway of an airport; and (c) retrievals containing an image identification number. 
• For Brodatz images, the best results using the 
Gabor features were obtained using four scales 
and six orientations within each scale. 

In [11], we provided an application to the search and 
retrieval of aerial photographs using the SRC 
descriptor. Some retrieval examples on the airphoto 
database are shown in Fig. 7. 

6. Discussions 

We have presented a texture descriptor for 
browsing and similarity retrieval applications. 
A comprehensive evaluation of its performance in 
similarity retrieval is given in [13]. The browsing 
component extends its functionality, and enables 
coarse level classification of the database. 

In the UCSB digital library project, the descriptor 
is used to facilitate query by example in a large aerial 
photograph database. The proposed texture descriptor 
provides a robust representation of many geographi- 
cally salient features such as housing developments, 
parking lots, highways, airports, and agricultural 
regions. Details of this work can be found in [11]. 

The proposed descriptor has been used in other 
application domains as well. For example, in [8], 
researchers from IBM have reported applying this 
texture descriptor to an image database related to 
petroleum exploration. They concluded that the 
Gabor feature set outperforms other texture fea- 
tures (computed using the quadratic-mirror filter, 
the discrete cosine transform, and the orthogonal 
wavelet transform) by a wide margin on their 
benchmark dataset. This is consistent with our 
earlier observation. 
